35 research outputs found

    Conditional Anomaly Detection with Soft Harmonic Functions

    Get PDF
    International audienceIn this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response or a class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method on several synthetic and UCI ML datasets in detecting unusual labels when compared to several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset where we seek to identify unusual patient-management decisions

    Conditional Anomaly Detection Using Soft Harmonic Functions: An Application to Clinical Alerting

    Get PDF
    International audienceTimely detection of concerning events is an important problem in clinical practice. In this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response, such as the omission of an important lab test. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method in detecting unusual labels on a real-world electronic health record dataset and compare it to several baseline approaches

    VESsel GENeration Analysis (VESGEN): Innovative Vascular Mappings for Astronaut Exploration Health Risks and Human Terrestrial Medicine

    Get PDF
    Currently, astronauts face significant health risks in future long-duration exploration missions such as colonizing the Moon and traveling to Mars. Numerous risks include greatly increased radiation exposures beyond the low earth orbit (LEO) of the ISS, and visual and ocular impairments in response to microgravity environments. The cardiovascular system is a key mediator in human physiological responses to radiation and microgravity. Moreover, blood vessels are necessarily involved in the progression and treatment of vascular-dependent terrestrial diseases such as cancer, coronary vessel disease, wound-healing, reproductive disorders, and diabetes. NASA developed an innovative, globally requested beta-level software, VESsel GENeration Analysis (VESGEN) to map and quantify vascular remodeling for application to astronaut and terrestrial health challenges. VESGEN mappings of branching vascular trees and networks are based on a weighted multi-parametric analysis derived from vascular physiological branching rules. Complex vascular branching patterns are determined by biological signaling mechanisms together with the fluid mechanics of multi-phase laminar blood flow

    A Prototype-driven Framework for Change Detection in Data Stream Classification

    No full text
    Abstract- This paper presents a prototype-driven framework for classifying evolving data streams. Our framework uses cluster prototypes to summarize the data and to determine whether the current model is outdated. This strategy of rebuilding the model only when significant changes are detected helps to reduce the computational overhead and the amount of labeled examples needed. To improve its accuracy, we also propose a selective sampling strategy to acquire more labeled examples from regions where the model’s predictions are unreliable. Our experimental results demonstrate the effectiveness of the proposed framework, both in terms of reducing the amount of model updates and maintaining high accuracy. 1

    Abstract

    No full text
    Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering ” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository.

    Learning classification with auxiliary probabilistic information

    No full text
    Abstract—Finding ways of incorporating auxiliary information or auxiliary data into the learning process has been the topic of active data mining and machine learning research in recent years. In this work we study and develop a new framework for classification learning problem in which, in addition to class labels, the learner is provided with an auxiliary (probabilistic) information that reflects how strong the expert feels about the class label. This approach can be extremely useful for many practical classification tasks that rely on subjective label assessment and where the cost of acquiring additional auxiliary information is negligible when compared to the cost of the example analysis and labelling. We develop classification algorithms capable of using the auxiliary information to make the learning process more efficient in terms of the sample complexity. We demonstrate the benefit of the approach on a number of synthetic and real world data sets by comparing it to the learning with class labels only. Keywords-classification learning; sample complexity; learning with auxiliary label information I

    Ranking refinement and its application to information retrieval

    No full text
    We consider the problem of ranking refinement, i.e., to improve the accuracy of an existing ranking function with a small set of labeled instances. We are, particularly, interested in learning a better ranking function using two complementary sources of information, ranking information given by the existing ranking function (i.e., the base ranker) and that obtained from users ’ feedbacks. This problem is very important in information retrieval where feedbacks are gradually collected. The key challenge in combining the two sources of information arises from the fact that the ranking information presented by the base ranker tends to be imperfect and the ranking information obtained from users’ feedbacks tends to be noisy. We present a novel boosting algorithm for ranking refinement that can effectively leverage the uses of the two sources of information. Our empirical study shows that the proposed algorithm is effective for ranking refinement, and furthermore it significantly outperforms the baseline algorithms that incorporate the outputs from the base ranker as an additional feature
    corecore